7 - Representation Learning for Pathological Speech Modeling [ID:12835]

Thank you for the kind introduction. Today I am going to talk about my recent work on representation learning to model and classify pathological speech. I have been in the lab since May 2018, so this is my fourth PRS, and since the last PRS in summer I have had some publications; these are some of them: the paper at C-ARP, which received the best paper award, my recent paper at Interspeech, and my accepted paper at ICASSP. We already participated last year in the Interspeech hackathon creating Alexa skills, which is why an Alexa skill is going to help me during this talk. Today I am going to tell you about this recent paper, submitted to Speech Communication, on parallel representation learning for the classification of pathological speech.

Pathological speech processing has focused on diseases with different origins: patients with laryngeal cancer, polyps, or nodules; patients with morphological disorders such as cleft lip and palate; and patients with neurological diseases such as Parkinson's disease, Huntington's disease, or Alzheimer's. Clinical

observations in the speech of patients can be measured and objectively analyzed with the aim of addressing two main problems. The first is to support the diagnosis of the disease by classifying between healthy control subjects and patients; the second, once a patient is diagnosed, is to predict the level of degradation of the patient's speech according to a specific clinical scale measuring intelligibility, articulation, and other aspects. My main claim is that the general handcrafted features extracted in related studies may not capture enough information to characterize the presence of pathological disorders that affect different aspects of the speech production system. Classical features

addressed in the literature include phonation measures regarding perturbation of the vocal fold vibration; articulation measures regarding formant frequencies and different resonances in the vocal tract; prosody features regarding fundamental frequency, energy, and speech rate disturbances, among others; and intelligibility measures based on the word error rate. Current

trends in pathological speech modeling can be divided, or I have divided them, into three main aspects. The first is based on speaker models using Gaussian mixture models, i-vectors, or the more recent x-vectors; some of these models were already used in my recent paper at ICASSP. The second is phonological features, regarding the estimation and prediction of posterior probabilities for phonological classes such as plosives, nasals, and fricatives, among others. The third, which is the method I would like to talk about today, is representation learning strategies: mainly embeddings derived from a neural network to represent pathological speech signals. These methods are inspired mainly by the natural language processing

community. So, Alexa is going to help me during this slide. Alexa, open presentation. "Hi, what do you want to know about?" Methods. "We proposed a novel strategy based on unsupervised representation learning for the automatic classification of pathological speech. For this purpose, we trained recurrent and convolutional autoencoders to extract informative features that characterize the presence of speech disorders. We additionally proposed a novel feature set based on the reconstruction error of the autoencoders. We think this can be very good." So, thank you, Alexa. Alexa, stop. So, in the first case, we propose

a convolutional autoencoder with the aim of mapping the spatial distribution of the energy present in a spectrogram. The input of the autoencoder is a mel-scale spectrogram with 128 mel filters and a time frame of 500 milliseconds. We consider the bottleneck representation here to provide a suitable representation to reconstruct the input spectrogram.
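The convolutional branch described above can be sketched as follows. This is a minimal illustration, not the exact architecture from the paper: the 64-frame window (covering roughly 500 ms), the layer sizes, and the 256-dimensional bottleneck are all assumptions for the example.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Sketch of a convolutional autoencoder over mel-spectrogram windows
    of 128 mel bands x 64 frames; layer sizes are illustrative."""

    def __init__(self, bottleneck_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # (1,128,64) -> (16,64,32)
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> (32,32,16)
            nn.ReLU(),
        )
        self.to_bottleneck = nn.Linear(32 * 32 * 16, bottleneck_dim)
        self.from_bottleneck = nn.Linear(bottleneck_dim, 32 * 32 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                               padding=1, output_padding=1),        # -> (16,64,32)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),        # -> (1,128,64)
        )

    def forward(self, x):
        # z is the bottleneck representation used as the embedding
        z = self.to_bottleneck(self.encoder(x).flatten(1))
        h = self.from_bottleneck(z).view(-1, 32, 32, 16)
        return self.decoder(h), z
```

Training would minimize the mean squared error between the input and the reconstructed spectrogram; after training, the bottleneck vector `z` serves as the feature vector for a downstream pathological-speech classifier.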

On the other hand, we have a recurrent autoencoder with the aim of modeling the temporal evolution of the spectral components present in a speech frame. The input is the same as in the previous case: a mel-scale spectrogram with 128 filters and a time frame of 500 milliseconds. In this case, we also consider the bottleneck features to represent the speech signal. From both autoencoders, we propose two different feature sets to evaluate the presence of speech disorders. The first one, as classically addressed, is the bottleneck features. But we propose an additional feature set based on the mean squared reconstruction error.
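The recurrent branch and the reconstruction-error feature set can be sketched in the same style. Again this is an illustrative assumption, not the paper's exact model: the LSTM sizes, the 128-dimensional bottleneck, and the per-mel-band averaging of the error are choices made for the example.

```python
import torch
import torch.nn as nn

class RecurrentAutoencoder(nn.Module):
    """Sketch of a recurrent autoencoder over mel-spectrogram frames;
    hidden and bottleneck sizes are illustrative."""

    def __init__(self, n_mels=128, hidden_dim=256, bottleneck_dim=128):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden_dim, batch_first=True)
        self.to_bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        self.from_bottleneck = nn.Linear(bottleneck_dim, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_mels)

    def forward(self, x):                       # x: (batch, frames, n_mels)
        _, (h, _) = self.encoder(x)
        z = self.to_bottleneck(h[-1])           # bottleneck features
        # feed the bottleneck vector to the decoder at every time step
        dec_in = self.from_bottleneck(z).unsqueeze(1).repeat(1, x.size(1), 1)
        y, _ = self.decoder(dec_in)
        return self.out(y), z

def reconstruction_error_features(x, recon):
    """Mean squared reconstruction error, averaged over time so that one
    value per mel band remains; this vector is the second feature set."""
    return ((x - recon) ** 2).mean(dim=1)       # (batch, n_mels)
```

The intuition behind the error-based features is that an autoencoder trained only on healthy speech reconstructs pathological speech poorly, so the per-band error profile itself becomes discriminative.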

Part of a video series:

Presenter

M. Sc. Juan Camilo Vasquez Correa

Access

Open access

Duration

00:13:16 min

Recording date

2020-02-17

Uploaded on

2020-02-17 18:36:53

Language

en-US

Tags

reconstruction tasks autoencoder features classification validation frequency bottleneck fusion lab obtained average speech patients databases